Telecom Churn Classification Analysis (EDA)

Every business wants to grow its customer base. To do so, it must not only attract new customers but also retain existing ones. Retaining a client costs the company less than acquiring a new one. In addition, a new client may have only a weak interest in the company's services and be difficult to work with, whereas for existing clients the company already has data on how they interact with the service.

Accordingly, by predicting churn, we can react in time and try to keep a client who is about to leave. Based on data about the services the client uses, we can make them a special offer to try to change their decision to leave the operator. This makes retention easier to carry out than the acquisition of new users, about whom we know nothing yet.

You are provided with a dataset from a telecommunications company. The data contains information about almost six thousand users, their demographic characteristics, the services they use, the duration of using the operator's services, the method of payment, and the amount of payment.

The task is to analyze the data and predict the churn of users (to identify people who will and will not renew their contract). The work should include the following mandatory items:

  1. Description of the data (with the calculation of basic statistics);
  2. Research of dependencies and formulation of hypotheses;
  3. Building models for predicting churn (with justification for the choice of a particular model) based on tested hypotheses and identified relationships;
  4. Comparison of the quality of the obtained models.

The dataset contains the following fields:

  1. customerID - customer id
  2. gender - client gender (male / female)
  3. SeniorCitizen - is the client retired (1, 0)
  4. Partner - is the client married (Yes, No)
  5. tenure - how many months a person has been a client of the company
  6. PhoneService - is the telephone service connected (Yes, No)
  7. MultipleLines - are multiple phone lines connected (Yes, No, No phone service)
  8. InternetService - client's Internet service provider (DSL, Fiber optic, No)
  9. OnlineSecurity - is the online security service connected (Yes, No, No internet service)
  10. OnlineBackup - is the online backup service activated (Yes, No, No internet service)
  11. DeviceProtection - does the client have equipment insurance (Yes, No, No internet service)
  12. TechSupport - is the technical support service connected (Yes, No, No internet service)
  13. StreamingTV - is the streaming TV service connected (Yes, No, No internet service)
  14. StreamingMovies - is the streaming cinema service activated (Yes, No, No internet service)
  15. Contract - type of customer contract (Month-to-month, One year, Two year)
  16. PaperlessBilling - whether the client uses paperless billing (Yes, No)
  17. PaymentMethod - payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic))
  18. MonthlyCharges - current monthly payment
  19. TotalCharges - the total amount that the client paid for the services for the entire time
  20. Churn - whether there was a churn (Yes or No)

Loading Libraries

Loading Dataset

Exploratory Data Analysis

Summarizing the dataset

Statistical tests

For each of the following tests, we'll reject the null hypothesis if the p-value is less than 0.05.

The null hypothesis for each of the following tests is

H0: The two tested variables are independent.
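To make the decision rule concrete, here is a minimal, dependency-free sketch of a Pearson chi-square test of independence on a contingency table. It is an illustration under stated assumptions, not the notebook's own test code: the function names `chi_square_independence` and `chi2_sf` and the toy counts are hypothetical, and the sketch applies no continuity correction.

```python
import math


def chi2_sf(x, dof):
    """Upper tail P(X > x) of a chi-square distribution with integer dof >= 1."""
    half = x / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{i < dof/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, dof // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd dof: erfc(sqrt(x/2)) plus a finite sum over half-integer powers.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)  # (x/2)^(1/2) / Gamma(3/2)
    for i in range(1, (dof - 1) // 2 + 1):
        total += math.exp(-half) * term
        term *= half / (i + 0.5)
    return total


def chi_square_independence(table):
    """Return (statistic, dof, p_value) for an r x c table of observed counts.

    Assumes every row and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under H0 (independence).
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof, chi2_sf(stat, dof)


# Made-up counts for illustration only; they do NOT come from the dataset.
toy_table = [[10, 20], [20, 10]]
stat, dof, p_value = chi_square_independence(toy_table)
alpha = 0.05
# H0 (independence) is rejected when the p-value falls below alpha.
print(f"chi2={stat:.3f}, dof={dof}, p={p_value:.4f}, reject H0: {p_value < alpha}")
```

In a pandas workflow, the table would typically be built with `pandas.crosstab` and passed to `scipy.stats.chi2_contingency`. That function applies Yates' continuity correction to 2x2 tables by default, so its result can differ slightly from this uncorrected sketch. The results discussed below come from tests on the real data, not from this toy table.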

Since the reported p-value is 0.0, we reject the null hypothesis: the two variables are not independent.

We can conclude that InternetService is not independent of any of the following features; each test gives the same reported p-value of 0.0:

- OnlineSecurity
- OnlineBackup
- DeviceProtection
- TechSupport
- StreamingTV
- StreamingMovies

The p-value is below 0.05, so we reject the null hypothesis. Therefore, InternetService and Contract aren't independent features.

The p-value is below 0.05, so we reject the null hypothesis. Therefore, InternetService and PhoneService aren't independent features.

The p-value is below 0.05, so we reject the null hypothesis. Therefore, InternetService and MultipleLines aren't independent features.

Visualizations

There is a clear pattern: TotalCharges grows roughly in proportion to tenure. However, we see a small gap for tenure values in [45, 60+].
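One way to put a number on a visual pattern like this is the Pearson correlation coefficient. The sketch below is only an illustration: the helper `pearson_r` and the toy values are hypothetical and are not taken from the dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up values for illustration only; they do NOT come from the dataset.
toy_tenure = [1, 12, 24, 48, 72]                # months
toy_total = [30.0, 700.0, 1500.0, 3100.0, 5000.0]  # total amount paid

print(f"Pearson r = {pearson_r(toy_tenure, toy_total):.3f}")
```

A coefficient close to 1 would be consistent with a near-linear relationship; with pandas, `DataFrame.corr` gives the same quantity for pairs of numeric columns.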

Here we can see that customers who pay above-average rates tend to leave the company more often than customers who pay below-average rates.

From the data above, it seems that customers with longer tenure renew their contracts much more often.

The data above shows that customers with month-to-month contracts churn much more often, whereas customers with two-year contracts leave the company the least.
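Comparisons like these come down to a churn rate per category. A minimal sketch of that computation is shown below; the helper `churn_rate_by` and the toy rows are hypothetical and serve only to illustrate the calculation, not to reproduce the dataset's figures.

```python
from collections import defaultdict


def churn_rate_by(rows, key):
    """Return {category: share of rows with Churn == 'Yes'} for the given column."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        if row["Churn"] == "Yes":
            churned[row[key]] += 1
    return {category: churned[category] / totals[category] for category in totals}


# Made-up rows for illustration only; they do NOT come from the dataset.
toy_rows = [
    {"Contract": "Month-to-month", "Churn": "Yes"},
    {"Contract": "Month-to-month", "Churn": "Yes"},
    {"Contract": "Month-to-month", "Churn": "No"},
    {"Contract": "One year", "Churn": "No"},
    {"Contract": "One year", "Churn": "Yes"},
    {"Contract": "Two year", "Churn": "No"},
    {"Contract": "Two year", "Churn": "No"},
]

for contract, rate in churn_rate_by(toy_rows, "Contract").items():
    print(f"{contract}: {rate:.0%}")
```

The same helper works for any categorical column, for example a flag marking whether MonthlyCharges is above the average.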